Discussion:
i2c address probing and AT24RF08 corruption
Jean Delvare
2004-04-16 13:47:39 UTC
Permalink
Hi all,

Frodo, please do not forward this mail to the lm_sensors mailing list, I
really want you to read this!

I am starting this thread on the lm_sensors mailing-list with all of you
in copy in the hope to put an end to the Thinkpad issue that has been
plaguing our project since it started, more than five years ago.

In i2c-core as well as in our userspace tools (sensors-detect,
i2cdetect), we are using the SMBus "quick command"
(I2C_FUNC_SMBUS_QUICK) for i2c address probing purposes. This is what
is causing trouble with the AT24RF08 chip found in most Thinkpad
laptops, among other systems.

Our README.thinkpad document states:

"'Write Quick 0' is not an actual write and should never generate a
write operation to an eeprom location."

This is not exact. It should not generate a write operation to an eeprom
location, granted. But this *is* an actual write. Quoting the SMBus
spec:

"Here, part of the slave address denotes the command ? the R/W# bit. The
R/W# bit may be used to simply turn a device function on or off, or
enable/disable a low-power standby mode."

So it clearly states that any chip is entitled to change its state or
whatever it pleases when receiving a "quick command". After reading
this, I cannot blame Atmel for poor hardware design on the AT24RF08
anymore. What's really poor is our use of the "quick command" IMHO.

For this reason, I can hardly understand how it was decided that this
command would be used for address probing purposes. I agree that the
SMBus spec has no provision for address probing, so we have to use
another command to achieve this purpose, still I would expect a
"receive byte" (I2C_FUNC_SMBUS_READ_BYTE) to be way safer than a "quick
command". The only side effect I can think off is that it would
increase the address pointer for chips in autoincrement mode, but
that's about all and it doesn't seem to hurt, really. Another drawback
is that it requires 20 clock cycles instead of 11, but this is
definitely neglectable if it instantly solves the AT24RF08 corruption
issue and possibly other potential problems as well.

The problem is that I can hardly believe that nobody proposed the same
as I am doing right now in 5 years, although I couldn't find any
explanation as to why the "quick command" choice was made in the first
place. The only relevant post (unanswered) I could find is here:
http://archives.andrew.net.au/lm-sensors/msg00465.html

I am seriously planning to change this in all three places (i2c-core,
sensors-detect and i2cdetect) and would like to hear objections if they
exist.

Joe, Mark, you have been testing AT24RF08 chips in the past, could you
tell me if the corruption would also occur if we were sending a bit "1"
(instead of 0) in the "quick command"? I don't plan to do this (no real
reason it would be safer than sending a "0", except that it then looks
like the beginning of a read command instead of a write one) but if it
turns out that using a "receive byte" command for probing is not
possible, then this is another change to consider.

Everyone please speak up and give me your opinion on my proposal. I am
kind of tired of all the overhead work needed just because of the
Thinkpad problem and would like to definitely solve that issue as soon
as possible.

Thanks.
--
Jean Delvare
http://www.ensicaen.ismra.fr/~delvare/
Gerd Knorr
2004-04-16 14:45:25 UTC
Permalink
Hi,
Post by Jean Delvare
For this reason, I can hardly understand how it was decided that this
command would be used for address probing purposes.
Didn't noticed that this happens, never digged that deep in the smbus
code ...
Post by Jean Delvare
I agree that the SMBus spec has no provision for address probing, so
we have to use another command to achieve this purpose, still I would
expect a "receive byte" (I2C_FUNC_SMBUS_READ_BYTE) to be way safer
than a "quick command".
At least for the i2c-based video4linux stuff that should work without
problems. bttv actually uses reads for address probing.
Post by Jean Delvare
The only side effect I can think off is that it would increase the
address pointer for chips in autoincrement mode, but that's about all
and it doesn't seem to hurt, really.
At least with i2c you can even do zero-length reads, i.e. just send the
read address and watch whenever someone acks it or not, without actually
reading data. Not sure whenever that is true for smbus as well.
Post by Jean Delvare
I am seriously planning to change this in all three places (i2c-core,
sensors-detect and i2cdetect) and would like to hear objections if they
exist.
/me votes for doing that.

Gerd
--
http://bigendian.bytesex.org
Jean Delvare
2004-04-16 15:00:29 UTC
Permalink
Post by Gerd Knorr
Post by Jean Delvare
The only side effect I can think off is that it would increase the
address pointer for chips in autoincrement mode, but that's about
all and it doesn't seem to hurt, really.
At least with i2c you can even do zero-length reads, i.e. just send
the read address and watch whenever someone acks it or not, without
actually reading data. Not sure whenever that is true for smbus as
well.
This is exactly a SMBus "quick command" with data bit 1, if I understood
it correctly. This isn't what we do (we do "quick command" with data
bit 0), but that's my second choice if it turns out that "receive byte"
isn't possible as a probing command.

Thanks.
--
Jean Delvare
http://www.ensicaen.ismra.fr/~delvare/
Gerd Knorr
2004-04-16 15:54:39 UTC
Permalink
Post by Jean Delvare
Post by Gerd Knorr
At least with i2c you can even do zero-length reads, i.e. just send
the read address and watch whenever someone acks it or not, without
actually reading data. Not sure whenever that is true for smbus as
well.
This is exactly a SMBus "quick command" with data bit 1, if I understood
it correctly. This isn't what we do (we do "quick command" with data
bit 0), but that's my second choice if it turns out that "receive byte"
isn't possible as a probing command.
I'd prefeare a reading "quick command" over a "receive byte" as we don't
actually want read a byte on probing ...

Gerd
--
http://bigendian.bytesex.org
Jean Delvare
2004-04-16 17:04:55 UTC
Permalink
Post by Gerd Knorr
Post by Jean Delvare
This is exactly a SMBus "quick command" with data bit 1, if I
understood it correctly. This isn't what we do (we do "quick
command" with data bit 0), but that's my second choice if it turns
out that "receive byte" isn't possible as a probing command.
I'd prefeare a reading "quick command" over a "receive byte" as we
don't actually want read a byte on probing ...
There is no such thing as a "reading quick command", that's the problem.
In the SMBus spec, the "quick command" is described as sending a single
but to the chip (so the R/W but loses its R/W status on this one). So
you write either 0 or 1, but you write it.

I think it's absolutely silly from the SMBus folks to have done that,
since it's against the spirit of I2C (I would even say it makes SMBus
incompatible with I2C), and dangerous, and useless, but that's how it
is.

That said, after reading this:
http://archives.andrew.net.au/lm-sensors/msg01143.html
I understand that this is not the "quick command" itself corrupting the
24RF08 but the fact that its state machine is broken and it really sees
the "quick command 0" as the first byte of a write.

I wish I had read that before sending my proposal to everyone, but hey
that's how life goes.

Anyway, what you say here, and the facts on the 24RF08, confirm my
impression that a quick write 1 would be less dangerous than a quick
write 0. OTOH I am told that the former is likely to make some chips
lock the bus. So we're not only fighting against the 24RF08's broken
state machine, but also with some other chips (don't even have the
complete list). Damn, I hate it when broken hardware forces us to write
broken software to workaround :(

Thanks.
--
Jean Delvare
http://www.ensicaen.ismra.fr/~delvare/
Philip Edelbrock
2004-04-16 17:34:56 UTC
Permalink
(A couple comments in line below)
Post by Jean Delvare
Post by Gerd Knorr
Post by Jean Delvare
This is exactly a SMBus "quick command" with data bit 1, if I
understood it correctly. This isn't what we do (we do "quick
command" with data bit 0), but that's my second choice if it turns
out that "receive byte" isn't possible as a probing command.
I'd prefeare a reading "quick command" over a "receive byte" as we
don't actually want read a byte on probing ...
There is no such thing as a "reading quick command", that's the problem.
In the SMBus spec, the "quick command" is described as sending a single
but to the chip (so the R/W but loses its R/W status on this one). So
you write either 0 or 1, but you write it.
Correct. However, it might be seen as the equivelent to an aborted read
operation, depending on how the client chip treats it. There are very
few chips that use a quick read/write for anything practical. Of those
that do, I think they are mostly simple switches.
Post by Jean Delvare
I think it's absolutely silly from the SMBus folks to have done that,
since it's against the spirit of I2C (I would even say it makes SMBus
incompatible with I2C), and dangerous, and useless, but that's how it
is.
http://archives.andrew.net.au/lm-sensors/msg01143.html
I understand that this is not the "quick command" itself corrupting the
24RF08 but the fact that its state machine is broken and it really sees
the "quick command 0" as the first byte of a write.
Exactly. A 'quick read/write' is the closest thing to sending an
address, waiting for an ack, and then sending a stop. As you mention,
the state machine in the AT24RF08 doesn't respond to the stop in this
case, so it keeps listening for bytes like those directed to other chips
and constructs in it's state machine an unintended data-byte-write (or
something similar).
Post by Jean Delvare
I wish I had read that before sending my proposal to everyone, but hey
that's how life goes.
Anyway, what you say here, and the facts on the 24RF08, confirm my
impression that a quick write 1 would be less dangerous than a quick
write 0. OTOH I am told that the former is likely to make some chips
lock the bus. So we're not only fighting against the 24RF08's broken
state machine, but also with some other chips (don't even have the
complete list). Damn, I hate it when broken hardware forces us to write
broken software to workaround :(
Unfortunately, probing and detection is something we have just sort of
'made up' since there is nothing nice and defined in the way we can see
what's on a bus. It would be great to have some table like PCI does to
see the vendor id, device id, etc. But I2C/SMBus is, by design, is much
simplier than that.


Phil
Jean Delvare
2004-04-17 13:48:42 UTC
Permalink
Post by Philip Edelbrock
Unfortunately, probing and detection is something we have just sort
of 'made up' since there is nothing nice and defined in the way we
can see what's on a bus. It would be great to have some table like
PCI does to see the vendor id, device id, etc. But I2C/SMBus is,
by design, is much simplier than that.
Correct.

OK, thanks everyone for the feedback. Here is a summary of my knowledge,
and a new, more realistic (hopefully) proposal.

1* Some chips are write-only, and will lock the bus on read attempts.
Such devices include clock chips frequently found in PCs at address
0x69. Thus probing all addresses with a "receive byte" (instead of
"quick write 0" for now) or even "quick read 1" isn't an option.

2* The AT24RF08 found in various IBM machines and possibly others has a
broken state machine. Corruption is triggered by sending "quick write 0"
to addresses 0x54-0x57 and possibly 0x5C. We worked around it by
doubling the "quick write 0", it works on most cases but isn't safe
because something could intervene inbetween these two commands.
Especially in the kernel, the first quick write is done by i2c-core and
the second is under the responsability of the chip driver. This isn't
safe, to say the least. If you take a look at the eeprom driver (both
2.4 and 2.6), you'll notice that this second quick write is issued
conditionally. It's probably wrong and I'm certain that corruptions
could still occur (without the extra IBM-detection stuff, it would). So
I don't consider it a solution.

3* Quick write is a write command in SMBus, some chips could change
states upon receiving it. It was never seen so far, but could happen. I
underline this in order to insist on the fact that our use of "quick
write 0" for probing purposes has no legitimacy. If just happens to be
the less worst solution in a majority of cases.

Now the solution I have come to is the following. It's rather simple and
I wonder why nobody proposed that yet. Once again, I expect that some of
you will explain to me why it isn't possible (I was expecting this for
my first proposal and bingo! I won ;))

My point is that there is no reason to stick to a single method to
detect devices at all addresses.

We pretty well know where the AT24RF08 lives. Why not just use "receive
byte" commands to probe these addresses, instead of a "quick write 0"
and then hurrying to send another before a corruption occurs? I've
checked, and these addresses (0x54-0x57 and 0x5C) are only used by
EEPROMs as far as we know, which will all acknoledge on a "receive byte"
command.

I would even extend the range a little bit more. Many EEPROMs have
additional addresses in the 0x30-0x37 range (typically their primary
address - 0x20), which is often refered to as "shadow eeprom". This is
actually something similar to AT24RF08's 0x5C: page protection address.
I read several datasheets this morning with Rudolf Marek, and we found
that some EEPROMs even use the 0x30-0x37 range for their primary
address. Some older EEPROMs have primary addresses at 0x58-0x5F. And
some EEPROMs could write protect themselves definitely on any write to
their page protection address (unconfirmed).

All in all I wouldn't be against using "receive byte" commands to probe
the whole 0x30-0x3F and 0x50-0x5F ranges. We know little non-EEPROM
chips living in these ranges:

LTC1710: 0x58-0x5A.
The docs say that the chip will not acknowledge on reads. I'd like to
know if it just keeps quiet or locks the bus. Phil? This is an uncommon
chip if I'm not mistaking, unlikely to be found in PCs. People who have
this probably know they do and where it is, and can use a force
parameter.

SAA1064: 0x38-0x3B
MAX6900: 0x50
These two ones will acknoledge on reads, no problem.

So I propose that we agree on which ranges should be affected by the
change, and just do that in sensors-detect, i2cdetect and i2c-core.
After that we have to remove the extra quick writes in eeprom and
ddcmon, and later the IBM detection in sensors-detect and i2c-piix4.

I have been testing with the following ranges in sensors-detect:
0x30-0x37, 0x3C-0x3F, 0x50-0x57 and 0x5C-0x5F on my two systems (which
include EEPROMs on-board and over DDC, and other chips) and it worked
OK. This set of ranges is only a proposal, I have no strong objection
against a more extended range set (0x30-0x3F and 0x50-0x5F for example)
or a shorter one (as long as at least all five addresses of the AT24RF08
are included).

Objections, comments, anyone?

Thanks.
--
Jean Delvare
http://www.ensicaen.ismra.fr/~delvare/
Michael Hunold
2004-04-19 14:10:22 UTC
Permalink
Hello,
Post by Jean Delvare
0x30-0x37, 0x3C-0x3F, 0x50-0x57 and 0x5C-0x5F on my two systems (which
include EEPROMs on-board and over DDC, and other chips) and it worked
OK. This set of ranges is only a proposal, I have no strong objection
against a more extended range set (0x30-0x3F and 0x50-0x5F for example)
or a shorter one (as long as at least all five addresses of the AT24RF08
are included).
Objections, comments, anyone?
Just a side note as you are talking about address ranges and the like
here: Please keep in mind that there is a world outside smbus which
also uses i2c and where your assumtions are not nessesarely correct even
if they are for smbus.
I've added the .class bitfield during 2.5 to help avoid probing problems.
The basic idea is to classify the busses be able to limit probing to
busses where it makes sense to do that. The tuner.o module for example
will simply ignore any i2c adapter which hasn't the
I2C_ADAP_CLASS_TV_ANALOG bit set in .class because it doesn't make much
sense to look for a tv card tuner somewhere else. It also doesn't make
much sense to check for sensor chips on a TV card ;)
Don't know how widely this new .class bitfield is used outsite the v4l
world, but it is a good idea to use it. Reportly some DVB card
frontends are reacting very sensitive to the lm_sensor probes. Right
now dvb uses a separate i2c system so it isn't a big issue right now,
but very likely the dvb will switch over to the kernel's i2c subsystem
soon.
Correct. I recently presented a patch which added a .class member to the
client struct and a new flag to the adapter, so the i2c-core can check
for matching .class fields (if the .class field has been set in the
client) if that flag is present.

This solves the problem, where the i2c client doesn't check the .class
field of the adapter (most non-v4l clients out there), without affecting
backward compatibility.

The DVB system will switch to kernel i2c soon, and we need this
behaviour to make sure we have an isolated i2c adapter with only DVB i2c
clients.

Later it was discussed to clear this issue once and for all and make a
.class entry mandatory for i2c clients (this breaking compability).

Although I agree that this makes really sense, I still vote for the
simple "extent client structure + add a new flag" solution to get things
going.

I'm going to present this patch once more and start the discussion again.
Gerd
CU
Michael.
Gerd Knorr
2004-04-19 14:13:07 UTC
Permalink
Post by Jean Delvare
0x30-0x37, 0x3C-0x3F, 0x50-0x57 and 0x5C-0x5F on my two systems (which
include EEPROMs on-board and over DDC, and other chips) and it worked
OK. This set of ranges is only a proposal, I have no strong objection
against a more extended range set (0x30-0x3F and 0x50-0x5F for example)
or a shorter one (as long as at least all five addresses of the AT24RF08
are included).
Objections, comments, anyone?
Just a side note as you are talking about address ranges and the like
here: Please keep in mind that there is a world outside smbus which
also uses i2c and where your assumtions are not nessesarely correct even
if they are for smbus.

I've added the .class bitfield during 2.5 to help avoid probing problems.
The basic idea is to classify the busses be able to limit probing to
busses where it makes sense to do that. The tuner.o module for example
will simply ignore any i2c adapter which hasn't the
I2C_ADAP_CLASS_TV_ANALOG bit set in .class because it doesn't make much
sense to look for a tv card tuner somewhere else. It also doesn't make
much sense to check for sensor chips on a TV card ;)

Don't know how widely this new .class bitfield is used outsite the v4l
world, but it is a good idea to use it. Reportly some DVB card
frontends are reacting very sensitive to the lm_sensor probes. Right
now dvb uses a separate i2c system so it isn't a big issue right now,
but very likely the dvb will switch over to the kernel's i2c subsystem
soon.

Gerd
--
http://bigendian.bytesex.org
Jean Delvare
2004-04-19 15:50:39 UTC
Permalink
Just a side note as you are talking about address ranges and the
like here: Please keep in mind that there is a world outside smbus
which also uses i2c and where your assumtions are not nessesarely
correct even if they are for smbus.
I do know that and keep it in mind. However, I obviously don't know the
audio and video side of I2C as good as you do. The non-smbus chips I
thought about are the ones which are part of the lm_sensors project
(A/D and D/A converters, multiplexers, lcd displays, switches).

Do you know of any chip on you side using an address in the 0x30-0x3F or
0x50-0x5F ranges? If there are, I would need to know what they are and
how much they like quick writes and byte reads, so that I can refine my
changes.

As a side node, I don't expect people to run sensors-detect on
video-dedicated i2c busses...
Reportly some DVB card frontends are reacting very sensitive to the
lm_sensor probes.
.but I am obviously wrong ;) Any chance you could provide more details
about how bad it goes? Maybe the change I am planning to do could
improve this too.

Thanks.
--
Jean Delvare
http://www.ensicaen.ismra.fr/~delvare/
Gerd Knorr
2004-04-19 16:33:54 UTC
Permalink
Post by Jean Delvare
I do know that and keep it in mind. However, I obviously don't know the
audio and video side of I2C as good as you do. The non-smbus chips I
thought about are the ones which are part of the lm_sensors project
(A/D and D/A converters, multiplexers, lcd displays, switches).
Do you know of any chip on you side using an address in the 0x30-0x3F or
0x50-0x5F ranges? If there are, I would need to know what they are and
how much they like quick writes and byte reads, so that I can refine my
changes.
As a side node, I don't expect people to run sensors-detect on
video-dedicated i2c busses...
That is only one part of the problem. The sensor modules itself do
probing as well. So if sensors-detect found a sensor on your smbus
and adds the driver module to the config, that driver module will
look for its sensor on all i2c adapters registered in the system,
i.e. also on TV cards.

Rather than mucking with address lists I'd prefeare to export the .class
field to userspace somehow (ioctl? sysfs?) so sensors-detect can see and
use it as well. That avoids any address clashes in the first place.

Gerd
--
http://bigendian.bytesex.org
Jean Delvare
2004-04-19 17:12:50 UTC
Permalink
Post by Gerd Knorr
Post by Jean Delvare
As a side node, I don't expect people to run sensors-detect on
video-dedicated i2c busses...
That is only one part of the problem. The sensor modules itself do
probing as well. So if sensors-detect found a sensor on your smbus
and adds the driver module to the config, that driver module will
look for its sensor on all i2c adapters registered in the system,
i.e. also on TV cards.
Agreed, but this is a different problem IMHO. Michael Hunold is working
on extending the .class usage in the kernel. I think it's a good idea, I
was just waiting for Greg to give us his opinion on whether we should
use an extra flag or make the check mandatory.
Post by Gerd Knorr
Rather than mucking with address lists I'd prefeare to export the
.class field to userspace somehow (ioctl? sysfs?) so sensors-detect
can see and use it as well. That avoids any address clashes in the
first place.
Sysfs. Yes, that's a good idea as well, and was already discussed on the
LKML. But let's handle each problem in its time.

My proposal of address ranges is admitedly a trick, just meant to
prevent about any form of eeprom corruption (although I of course put
the stress on the AT24RF08 because we know how bad it is). It is not
meant as a solution to prevent any kind of bus or video chips hangs. For
this, the .class selection mechanism is what we want. Also note that the
.class thing alone would not solve the eeprom corruption issues, since
they usually live on the same bus as the hardware monitoring chips.

Thanks.
--
Jean Delvare
http://www.ensicaen.ismra.fr/~delvare/
Gerd Knorr
2004-04-19 19:13:04 UTC
Permalink
Post by Jean Delvare
My proposal of address ranges is admitedly a trick, just meant to
prevent about any form of eeprom corruption (although I of course put
the stress on the AT24RF08 because we know how bad it is). It is not
meant as a solution to prevent any kind of bus or video chips hangs. For
this, the .class selection mechanism is what we want. Also note that the
.class thing alone would not solve the eeprom corruption issues, since
they usually live on the same bus as the hardware monitoring chips.
Yup, .class can not fully replace the address range lists, but it should
reduce the number of cases where the address lists are needed and thus
simplify maintaining them ;)

Analog TV cards usually have a have eeprom at 0xa0, with PCI Subsystem
ID and sometimes config info, serial number and such stuff ...
Everything else isn't critical, at least I'm not aware of any touble
cases in the analog TV world.

Gerd
--
http://bigendian.bytesex.org
Philip Edelbrock
2004-04-19 17:39:55 UTC
Permalink
Post by Jean Delvare
LTC1710: 0x58-0x5A.
The docs say that the chip will not acknowledge on reads. I'd like to
know if it just keeps quiet or locks the bus. Phil? This is an uncommon
chip if I'm not mistaking, unlikely to be found in PCs. People who have
this probably know they do and where it is, and can use a force
parameter.
BTW- I don't have a clue what this chip does when read. I don't think
it locks anything up or does anything bad, but it's been a few years
since I've played with one. And, yes, it's probably very, very rarely
used. The driver was written out of curiousity, not user demand.


Phil
Eugene Surovegin
2004-04-16 20:05:36 UTC
Permalink
Post by Jean Delvare
For this reason, I can hardly understand how it was decided that this
command would be used for address probing purposes. I agree that the
SMBus spec has no provision for address probing, so we have to use
another command to achieve this purpose, still I would expect a
"receive byte" (I2C_FUNC_SMBUS_READ_BYTE) to be way safer than a "quick
command". The only side effect I can think off is that it would
increase the address pointer for chips in autoincrement mode, but
that's about all and it doesn't seem to hurt, really.
In recent discussion about IBM IIC I mentioned that I had a hw were 1-byte read
scan locked the bus.

I just rechecked IBM PPC 440GP ref platform "Ebony" and indeed it has an i2c
device which behaves this way.

According to Ebony manual it's a CDC850 clock buffer (i2c address 0x69). To
recover the i2c bus I have to reset the whole board, not just I2C controller.

Just my $.02

Eugene.
Jean Delvare
2004-04-16 21:01:49 UTC
Permalink
Post by Eugene Surovegin
In recent discussion about IBM IIC I mentioned that I had a hw were
1-byte read scan locked the bus.
I just rechecked IBM PPC 440GP ref platform "Ebony" and indeed it has
an i2c device which behaves this way.
According to Ebony manual it's a CDC850 clock buffer (i2c address
0x69). To recover the i2c bus I have to reset the whole board, not
just I2C controller.
With "1-byte read" do you refer to SMBus' "quick command" with bit 1:
S Addr 1 [A] P
or "receive byte"?
S Addr Rd [A] [Data] NA P

Or would both lock it?
Post by Eugene Surovegin
Just my $.02
I just hope that everyone will contribute the same and we'll end up with
a bright shiny buck ;)

Thanks.
--
Jean Delvare
http://www.ensicaen.ismra.fr/~delvare/
Eugene Surovegin
2004-04-16 21:37:01 UTC
Permalink
Post by Jean Delvare
Post by Eugene Surovegin
In recent discussion about IBM IIC I mentioned that I had a hw were
1-byte read scan locked the bus.
I just rechecked IBM PPC 440GP ref platform "Ebony" and indeed it has
an i2c device which behaves this way.
According to Ebony manual it's a CDC850 clock buffer (i2c address
0x69). To recover the i2c bus I have to reset the whole board, not
just I2C controller.
S Addr 1 [A] P
or "receive byte"?
S Addr Rd [A] [Data] NA P
Receive byte of course (your second example), if you remember "quick command"
isn't implemented for IBM PPC 4xx IIC controller ... yet.

Eugene.
Loading...